Quantitative ¹⁸FDG-PET asymmetry features improve surgical outcome prediction in refractory epilepsy


Lohith Kini1,2*, Ashesh Thaker3*, Peter Hadar4, Russell T. Shinohara5, Mesha-Gay Brown6, Jacob Dubroff7, and Kathryn A. Davis2,4

1 Department of Bioengineering, University of Pennsylvania, Philadelphia PA 19104;
2 Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia PA 19104;
3 Department of Radiology, Neuroradiology, University of Colorado Hospital, Aurora CO 80045;
4 Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia PA 19104;
5 Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia PA 19104;
6 Department of Neurology, University of Colorado Hospital, Aurora CO 80045;
7 Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia PA 19104.

*Correspondence: lkini@mail.med.upenn.edu

Abstract

Objective: Recent studies suggest that quantitative FDG-PET features are strong predictors of surgical outcome in refractory epilepsy, independent of anatomic findings on MRI. In this study, we investigate the added predictive value of combining quantitative FDG-PET features with clinical variables, such as EEG and PET interpretations, compared to clinical variables alone.
Methods: A predictive model of surgical outcome was defined using a random forest classifier trained on quantitative features in 96 patients with drug-refractory epilepsy from the Hospital of the University of Pennsylvania Epilepsy Surgery Program (2003-2016). Quantitative features were calculated from asymmetry and z-score maps of PET images compared to a normative population, using Neurostat 3D-SSP and the Advanced Normalization Tools (ANTs). The model was trained to predict surgical outcome in a developmental cohort (n=77) and assessed on a separate validation cohort (n=19).
Results: The best-performing model had an accuracy of 0.68 (0.73 AUC) in identifying patients with seizure recurrence (Engel IB or worse) and an accuracy of 0.69 (0.70 AUC) in predicting poor outcome following resection (Engel II or worse). In addition, the best model identified several important features that were significantly associated with outcome. This model is shared through open-source software for both research and clinical use.
Interpretation: Complex quantitative FDG-PET imaging features can predict poor surgical outcome in patients with refractory epilepsy. These initial retrospective results in a large cohort strongly suggest that quantitative imaging features from regions in the epileptogenic network may inform the clinical decision-making process.

Keywords

3D-SSP, Asymmetry, PET, Drug Resistant Epilepsy, Surgical Outcome, Imaging Features, Machine Learning, Open source

Introduction

Epilepsy affects 65 million people worldwide, 20-40% of whom are refractory to medical therapy [1]. These refractory patients are at higher risk for epilepsy-associated morbidities, including premature death, seizure-related injuries, psychosocial dysfunction, and general reduction in quality of life measures [2]. Resective surgery has been the mainstay of treatment for these patients, but up to 30% of surgical patients have no significant improvement after surgery, and many continue to experience debilitating seizures years after surgery. Accurate identification of structural lesions or metabolic abnormalities on neuroimaging is thought to be a major contributing factor to seizure-free outcome following surgery; despite significant advances in imaging techniques, the goal of using neuroimaging in epilepsy to monitor therapeutic interventions, identify biomarkers, and predict post-surgical outcomes has yet to be fully realized [3].

Typical neuroimaging protocols for epilepsy presurgical planning at tertiary centers include interictal ¹⁸FDG-PET, combined with MRI, seizure semiology, neuropsychological testing, and scalp and intracranial electrocorticography, to better identify and map the seizure onset zone to be resected. While standard imaging techniques with qualitative interpretation are an important element of the clinical workup for surgical candidates, recent investigations using quantitative imaging features derived from PET have shown promise in identifying patients who may not be ideal candidates for surgery; some studies have identified important temporal and extratemporal regions on functional and structural imaging that predict prognosis [4-7]. Since these features are quantitative and computed semi-automatically, precise measurements of important imaging biomarkers can be acquired without solely relying on qualitative PET imaging reads, potentially improving pre-surgical planning.

However, prior studies have not built models that are readily translated into clinical practice, since very few have demonstrated robust, machine learning-based, semi-automated, generalizable models that can be used for prospective studies. In addition, many prior studies examined cohorts that were either small in sample size or focused exclusively on a subpopulation (such as patients with TLE or unilateral hippocampal sclerosis), minimizing the broad applicability of these findings. The future of neuroimaging in epilepsy will require harnessing both advances in machine learning and publicly shared big data [8] to encourage multi-site collaboration among clinicians and researchers and ultimately enable better prediction of surgical outcome. This approach has shown success in other neurological disorders such as spinal muscular atrophy [9]. In pursuit of this aim, we identified a set of quantitative FDG-PET imaging features predictive of surgical outcome, relying on asymmetry measurements and cortical z-scores (computed with Neurostat [10]). Additionally, we quantified the importance of these quantitative features relative to traditional qualitative clinical features, such as EEG and nuclear medicine radiologist interpretation of PET images, in improving the accuracy of predicting poor surgical outcome. Models built using these features were then validated on a separate dataset. Finally, all computational models used to compute these features are made publicly available via open-source software to help clinicians and researchers incorporate valuable quantitative PET information in their clinical decision-making process.

Patients And Methods

Patients

This study was approved by the University of Pennsylvania Institutional Review Board. Subjects were screened using electronic records and included in this study based on the following criteria: epilepsy surgery at the Hospital of the University of Pennsylvania (HUP) between 2003 and 2016; clinical evaluation by members of the Epilepsy Surgery Program; and available histopathologic and neuroradiology assessments by board-certified pathologists and radiologists, respectively. Patients were excluded if they lacked preoperative FDG-PET imaging, scalp EEG information, or radiology reads (Figure 1). Clinical data for this study were obtained from electronic and paper medical records and included the following: gender, date of birth, date of epilepsy surgery, date of onset of epilepsy, scalp EEG findings, neuroimaging findings, seizure lateralization and localization, region of surgical resection, and surgical outcome by Engel classification. Extensive presurgical evaluation, as part of the selection process for potential surgery at HUP, included detailed clinical examinations, scalp and invasive (subdural and depth electrode) electroencephalography, magnetic resonance imaging (MRI), and FDG-PET. Histological examination was completed after surgical resection. Seizure outcome for each patient was determined by epileptologists blinded to the results of this study (M-G.B., K.A.D.). For the purposes of this study, the binary seizure outcome variable was based on surgical outcome classified according to the Engel system.

FDG-PET Acquisition and Processing

Preoperative brain FDG-PET imaging was performed using a standard clinical protocol at our institution, which followed the procedure guidelines set forth by the Society of Nuclear Medicine [11]. Only CT attenuation-corrected images were used for subsequent processing, and DICOM images were resampled to a 128 x 128 matrix if not initially reconstructed at that size. PET data were then analyzed using Neurostat/3D-SSP [12], which applies spatial normalization followed by pixel-by-pixel comparison to a reference database. The resulting voxel-wise z-score maps of the brain volume were converted to Talairach space for subsequent analysis. Regions of interest in the cerebellum were excluded from analysis because of high signal noise in the PET maps.
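As a concrete illustration of the resampling step described above, the minimal sketch below brings a reconstructed PET slice to a 128 x 128 matrix. The file name and the use of pydicom/SciPy are assumptions for illustration and are not the clinical reconstruction pipeline itself.


In [ ]:
# A hedged sketch of resampling, assuming a pydicom/SciPy stack;
# the input file and single-slice handling are illustrative.
import numpy as np
import pydicom
from scipy.ndimage import zoom

ds = pydicom.dcmread("pet_ac_slice.dcm")    # hypothetical attenuation-corrected slice
px = ds.pixel_array.astype(np.float32)      # original reconstruction matrix

TARGET = 128
if px.shape != (TARGET, TARGET):
    factors = (TARGET / px.shape[0], TARGET / px.shape[1])
    px = zoom(px, factors, order=1)         # bilinear resampling to 128 x 128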

Quantitative Imaging Pre-Processing

All PET data were preprocessed (Figure 2) using Neurostat 3D-SSP [10] and the Advanced Normalization Tools (ANTs) framework [13] for feature map computation. To measure quantitative imaging features in different regions of the brain, the Automated Anatomic Labeling (AAL) parcellation scheme [14] was used to segment all PET images. To compute asymmetry features, the ANTs framework was used to register each PET image to its own reflection across the sagittal plane, yielding voxel-based and region-based (from the AAL parcellations) asymmetry indices. In summary, if the original PET image is defined as $P$ and the mirrored image registered (affine) to $P$ is called $\bar{P}$, then the asymmetry index was defined as

$$AI\big[x\big] = \frac{P\big[x\big] - \bar{P}\big[x\big]}{P\big[x\big] + \bar{P}\big[x\big]}$$

for all voxels $x$ in the image domain of $P$.

The regional PET asymmetry was defined as $$\text{Regional AI}[a] = \frac{P[a] - \bar{P}[a]}{P[a] + \bar{P}[a]}$$ for all regions $a$ in the AAL parcellation, where $P[a]$ denotes the average PET intensity within region $a$.

We also included the asymmetry index of the average 3D-SSP z-scores in ipsilateral and contralateral regions obtained from Neurostat. These features were computed and collated for both the developmental and validation cohorts.
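The asymmetry computation above can be sketched as follows using the ANTsPy bindings to the ANTs framework; the file names, the flip axis, and the AAL label volume are illustrative assumptions rather than the exact pipeline.


In [ ]:
# A hedged sketch of voxel-wise and regional asymmetry indices with ANTsPy.
import ants
import numpy as np

pet = ants.image_read("pet_talairach.nii.gz")   # hypothetical normalized PET volume
aal = ants.image_read("aal_labels.nii.gz")      # hypothetical AAL parcellation labels

# Mirror across the sagittal plane (assumed here to be the first array axis),
# then affinely register the mirrored image back to the original (P-bar above).
mirrored = pet.new_image_like(pet.numpy()[::-1, :, :])
reg = ants.registration(fixed=pet, moving=mirrored, type_of_transform="Affine")

P, Pbar = pet.numpy(), reg["warpedmovout"].numpy()
with np.errstate(divide="ignore", invalid="ignore"):
    voxel_ai = (P - Pbar) / (P + Pbar)          # AI[x] for every voxel x

# Regional AI over the mean uptake within each AAL region a.
labels = aal.numpy()
regional_ai = {}
for a in np.unique(labels[labels > 0]):
    p, pbar = P[labels == a].mean(), Pbar[labels == a].mean()
    regional_ai[int(a)] = (p - pbar) / (p + pbar)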

Clinical Variables

All models were trained on quantitative features, as well as the following clinical variables: scalp EEG localization, presence of interictal FDG-PET hypometabolism, and the binary presence of a structural lesion on anatomic imaging. The PET and MRI variables were based on qualitative nuclear medicine and neuroradiology interpretations, respectively. The extent of PET hypometabolism was coded as normal, restricted foci, subtle foci, multifocal, or diffuse. Scalp EEG was coded based on the lobe and hemisphere of ictal patterns. Patients with both temporal and extra-temporal ictal onset were coded separately (as LTL+ or RTL+) from patients with solely temporal onset.
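A minimal sketch of this coding scheme is shown below, assuming a pandas-based encoding; the example records and exact category label strings are illustrative assumptions.


In [ ]:
# A hedged sketch of the categorical coding of clinical variables.
import pandas as pd

pet_levels = ["normal", "restricted foci", "subtle foci", "multifocal", "diffuse"]
eeg_levels = ["LTL", "RTL", "LTL+", "RTL+", "extratemporal"]

clinical = pd.DataFrame({
    "eeg_localization": pd.Categorical(["LTL", "RTL+", "LTL"], categories=eeg_levels),
    "pet_hypometabolism": pd.Categorical(
        ["restricted foci", "diffuse", "normal"], categories=pet_levels, ordered=True
    ),
    "mri_lesion": [1, 0, 1],  # binary presence of a structural lesion on MRI
})

# One-hot encode the categorical variables for the random forest models.
X_clinical = pd.get_dummies(clinical, columns=["eeg_localization", "pet_hypometabolism"])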

Outcome Variable

The most recent outpatient visit was used to determine the binary seizure recurrence outcome variable. Three separate outcome classifications were then defined to measure predictive ability across the full breadth of outcome stratification: Classification 1 coded the outcome variable as 1 for seizure recurrence (Engel IB-D, II, III, IV) or 0 if entirely seizure free (Engel IA). Classification 2 coded the outcome variable as 0 for favorable surgical outcome (Engel I) or 1 for poor surgical outcome (Engel II-IV). Classification 3 coded the outcome variable as 1 for non-ideal favorable surgical outcome (Engel IB-D) or 0 for seizure freedom (Engel IA); all other patients (Engel II-IV) were excluded from this third classification, enabling better stratification among favorable outcomes. Patients were divided into developmental and validation cohorts by random selection across the entire cohort to avoid selection bias from, for example, technological advances in PET cameras over the study period. The developmental cohort was used for model training and feature selection, while the validation cohort, chosen at random to represent roughly 20% of the total patient cohort, was used to estimate the true accuracy of the model in predicting surgical outcome (Figure 1).
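The three codings and the random split can be sketched as follows; the helper function and the example Engel values are hypothetical illustrations of the scheme described above.


In [ ]:
# A hedged sketch of the outcome codings and the ~80/20 random split.
import numpy as np
from sklearn.model_selection import train_test_split

def outcome_label(engel, scheme):
    """Return the binary outcome for one patient, or None if excluded."""
    engel_i = ("IA", "IB", "IC", "ID")
    if scheme == 1:                            # Engel IA vs. IB-D, II-IV
        return 0 if engel == "IA" else 1
    if scheme == 2:                            # Engel I vs. II-IV
        return 0 if engel in engel_i else 1
    if scheme == 3:                            # Engel IA vs. IB-D; others excluded
        if engel == "IA":
            return 0
        return 1 if engel in engel_i else None

engel = ["IA", "IB", "II", "IA", "IV", "IC"]   # illustrative outcomes
y = np.array([outcome_label(e, scheme=1) for e in engel])

# Random selection of roughly 20% of patients into the validation cohort.
idx = np.arange(len(y))
dev_idx, val_idx = train_test_split(idx, test_size=0.2, random_state=0)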

Statistical Analysis and Machine Learning

Pearson's chi-squared tests were used to compare categorical clinical variables to outcome: resected region, MRI lesion status, pathology, and qualitative FDG-PET read. Three models were computed to measure the added benefit of including quantitative imaging features: clinical variables alone (Model A), clinical variables combined with features derived from Neurostat 3D-SSP (Model B), and clinical variables combined with asymmetry features (Model C). The clinical variables were scalp EEG localization, lesion status based on the MRI read, and the qualitative PET read (extent of interictal hypometabolism). All models were random forest classifiers using an optimized number of decision tree estimators, splitting at every node based on Gini importance [15]. Random forests (RF) are ensembles of decision tree classifiers that minimize generalization error by splitting on randomly selected features. A pre-analysis variable selection step was performed to create a shortlist of quantitative imaging biomarkers from the large set of features described in the quantitative imaging pre-processing section. Two steps of feature selection (primary and validation) were used, with features mutually selected in both steps constituting the final shortlist. The first step selected the features ranked highly by the F test. The second step used recursive feature elimination with an RF (maximum leaf depth of 2) to identify features that were important across a 5-fold cross-validation scheme. This feature selection method, previously applied in a machine learning study of Alzheimer's disease diagnosis, avoids overfitting and identifies useful imaging features from a larger set of predictors [16].
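A minimal sketch of this two-step selection on synthetic data is shown below, using scikit-learn; the number of F-test features retained (k) and the forest size are assumptions not specified in the text.


In [ ]:
# A hedged sketch of the two-step feature selection on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, RFECV

X_dev, y_dev = make_classification(n_samples=77, n_features=200, random_state=0)

# Step 1: rank features by the univariate F test (k is an assumption).
ftest = SelectKBest(score_func=f_classif, k=20).fit(X_dev, y_dev)
step1 = set(np.flatnonzero(ftest.get_support()))

# Step 2: recursive feature elimination with a shallow random forest,
# scored across 5 cross-validation folds.
rf = RandomForestClassifier(max_depth=2, n_estimators=100, random_state=0)
step2 = set(np.flatnonzero(RFECV(estimator=rf, cv=5).fit(X_dev, y_dev).get_support()))

# Final shortlist: features mutually selected by both steps.
shortlist = sorted(step1 & step2)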

Cross-validation and sensitivity/specificity analyses were performed on the developmental cohort using Python and R. ROC analysis of the final trained models applied to the validation cohort was performed using the pROC package in R. Confidence intervals were reported for both out-of-bag accuracy and 5-fold cross-validation AUC difference estimates between Model A and Models B and C in the development cohort. Additionally, each of the final selected features was fit in a univariate ordinal logistic regression model for outcome prediction to identify which quantitative features and cortical regions were important variables. Correlation between these features and outcome was analyzed using an ordinal scale adapted from the Engel classification, ranging from 1 (best outcome, class IA) to 7 (worst outcome, class IV) [17]. This outcome measure does not require arbitrary dichotomization of outcome and preserves valuable information in the clinical outcome. All correlations were examined with a significance threshold of p < 0.05. No correction was applied for multiple comparisons, to avoid increasing type II error, since this study sought to measure the added value of potential quantitative neuroimaging predictors of outcome that could be further explored in future prospective studies. In addition, the feature selection process eliminated many redundant features, minimizing the risk of type I error. Finally, accuracy, sensitivity, and specificity of the best-performing models were measured on the validation cohort. A DeLong test was used to compare the validation ROCs of Models B and C to Model A.
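The DeLong comparison itself was run with the pROC package in R; as a hedged Python stand-in, the sketch below estimates a confidence interval on the validation AUC difference between two models by paired bootstrap, with placeholder labels and predicted probabilities.


In [ ]:
# Not the DeLong test itself: a paired-bootstrap sketch of the validation
# AUC difference between two models, on placeholder data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=19)                       # illustrative labels (n=19)
p_a = rng.random(19)                                      # Model A probabilities (placeholder)
p_b = np.clip(0.6 * y_val + 0.4 * rng.random(19), 0, 1)   # Model B probabilities (placeholder)

deltas = []
for _ in range(2000):
    i = rng.integers(0, len(y_val), size=len(y_val))
    if len(np.unique(y_val[i])) < 2:                      # AUC needs both classes present
        continue
    deltas.append(roc_auc_score(y_val[i], p_b[i]) - roc_auc_score(y_val[i], p_a[i]))
ci_95 = np.percentile(deltas, [2.5, 97.5])                # 95% CI on the AUC difference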

Results

Patient Demographics

Table 1 shows a summary of the patient demographics in the two patient cohorts. There were 65 (68%) patients who were Engel I, 20 (21%) who were Engel II, 6 (6%) who were Engel III, and 5 (5%) who were Engel IV. Of the 65 Engel I patients, 32 were Engel IA and 33 were Engel IB, IC, or ID. Outcome was determined from the last clinical encounter (mean follow-up of 5.85 ± 3.77 years after surgery). Figure 3 shows 3 sample patients and their feature maps (3D-SSP and voxel-level asymmetry maps). In the full patient cohort, surgery was performed ipsilateral to the hypometabolic temporal lobe in 75 (78%) patients (27 were Engel IA, $\chi^2$ = 1.45, $p$ = 0.22). The remaining patients had normal PET reads (4 were Engel IA). Among the patients who had favorable outcome (Engel IA), 16 had left temporal lobe (LTL) resections and 16 had right temporal lobe (RTL) resections. Among the patients who had seizure recurrence (Engel IB-ID, II, III, or IV), 31 had LTL resections, 26 had RTL resections, and the rest had extra-temporal right hemispheric resections ($\chi^2$ = 4.97, $p$ = 0.17). On visual analysis of FDG-PET scans in patients with favorable outcome, 25 scans displayed restricted or subtle temporal lobe hypometabolism, 3 showed diffuse or multilobar hypometabolism, and 4 were normal. Among patients with poor outcome, 36 scans displayed restricted or subtle temporal lobe hypometabolism (16 left and 12 right), 13 showed diffuse or multilobar hypometabolism, and 15 were normal ($\chi^2$ = 7.98, $p$ = 0.05). Of the 61 patients with restricted or subtle hypometabolism, 38 had mesial temporal sclerosis (MTS), 11 had gliosis, 3 had focal cortical dysplasia (FCD), 3 had dual pathology (MTS and malformation of cortical development), 3 had low-grade tumors or vascular malformations, and 2 were normal. Of the 35 patients without restricted or subtle hypometabolism, 19 had MTS, 6 had gliosis, 3 had FCD, 3 had dual pathology, 1 had low-grade tumors or vascular malformations, and 3 were normal ($\chi^2$ = 1.35, $p$ = 0.93). Patients with complete seizure freedom trended toward being more likely to have lesions than those who fared poorly ($\chi^2$ = 1.6363, $p$ = 0.20). In patients with either subtle or restricted hypometabolism on PET, 17 of 39 (44%) lesional patients had seizure freedom, while only 8 of 22 (36%) non-lesional patients had seizure freedom ($\chi^2$ = 0.0808, $p$ = 0.78). When comparing MRI and PET reads, lesional patients tended to be more likely to have restricted lobar hypometabolism (39 of 55; 71%) than non-lesional patients (22 of 41; 54%) ($\chi^2$ = 2.31, $p$ = 0.13).


In [ ]:
from results import *                  # analysis helpers distributed with this notebook
import seaborn as sns                  # for the styled result tables below
from IPython.display import display    # to render feature importance tables
%matplotlib inline

In [ ]:
compute_table1()

Classification 1: Prediction of seizure recurrence: Engel IA vs. IB-D, II-IV

Feature Selection


In [ ]:
Classification1_SSP_features, Classification1_ASYMM_features = classification1_feature_selection()

In [ ]:
len(Classification1_SSP_features), len(Classification1_ASYMM_features)

Determine number of estimators


In [ ]:
n_estimatorsA, n_estimatorsB, n_estimatorsC = classification1_num_estimators(
    Classification1_SSP_features, 
    Classification1_ASYMM_features
)
print(n_estimatorsA, n_estimatorsB, n_estimatorsC)

Feature List and Importances (Table 2)


In [ ]:
results = classification1_feature_importances(
    Classification1_SSP_features, 
    Classification1_ASYMM_features
)
display(results)

Measure cross-validation scores


In [ ]:
resultsSSP, resultsASYMM = classification1_cross_validation(
    Classification1_SSP_features, 
    Classification1_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("blue",as_cmap=True)

In [ ]:
resultsSSP.style.background_gradient(cmap=cm,high=1.0)

In [ ]:
resultsASYMM.style.background_gradient(cmap=cm,high=1.0)

Validation on testing set (Table 4)


In [ ]:
results = classification1_validation(
    Classification1_SSP_features, 
    Classification1_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("green",as_cmap=True)
results.style.background_gradient(cmap=cm,high=1.0,low=0.0)

Classification 2: Prediction of poor surgical outcome: Engel I vs. II-IV

Feature selection


In [ ]:
Classification2_SSP_features, Classification2_ASYMM_features = classification2_feature_selection()
len(Classification2_SSP_features), len(Classification2_ASYMM_features)

Determine number of estimators


In [ ]:
n_estimatorsA, n_estimatorsB, n_estimatorsC = classification2_num_estimators(
    Classification2_SSP_features, 
    Classification2_ASYMM_features
)
print(n_estimatorsA, n_estimatorsB, n_estimatorsC)

Feature List and Importances (Table 3)


In [ ]:
results = classification2_feature_importances(
    Classification2_SSP_features, 
    Classification2_ASYMM_features
)
display(results)

Measure cross-validation scores


In [ ]:
resultsSSP, resultsASYMM = classification2_cross_validation(
    Classification2_SSP_features, 
    Classification2_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("blue",as_cmap=True)

In [ ]:
resultsSSP.style.background_gradient(cmap=cm,high=1.0)

In [ ]:
resultsASYMM.style.background_gradient(cmap=cm,high=1.0)

Validation on testing set (Table 4)


In [ ]:
results = classification2_validation(
    Classification2_SSP_features, 
    Classification2_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("green",as_cmap=True)
results.style.background_gradient(cmap=cm,high=1.0,low=0.0)

Classification 3: Prediction of seizure recurrence in Engel I: Engel IA vs. IB-D

Feature selection


In [ ]:
Classification3_SSP_features, Classification3_ASYMM_features = classification3_feature_selection()

In [ ]:
len(Classification3_SSP_features), len(Classification3_ASYMM_features)

Determine number of estimators


In [ ]:
n_estimatorsA, n_estimatorsB, n_estimatorsC = classification3_num_estimators(
    Classification3_SSP_features, 
    Classification3_ASYMM_features
)
print(n_estimatorsA, n_estimatorsB, n_estimatorsC)

Feature List and Importances


In [ ]:
results = classification3_feature_importances(
    Classification3_SSP_features, 
    Classification3_ASYMM_features
)
display(results)

Measure cross-validation scores


In [ ]:
resultsSSP, resultsASYMM = classification3_cross_validation(
    Classification3_SSP_features, 
    Classification3_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("blue",as_cmap=True)

In [ ]:
resultsSSP.style.background_gradient(cmap=cm,high=1.0)

In [ ]:
resultsASYMM.style.background_gradient(cmap=cm,high=1.0)

Validation on testing set (Table 4)


In [ ]:
results = classification3_validation(
    Classification3_SSP_features, 
    Classification3_ASYMM_features, 
    n_estimatorsA, 
    n_estimatorsB, 
    n_estimatorsC
)
cm = sns.light_palette("green",as_cmap=True)
results.style.background_gradient(cmap=cm,high=1.0,low=0.0)
